Searching the Annotated Portuguese Childes Corpora
Abstract
Recently, there has been a growing number of initiatives to annotate children's data for several languages, for instance with part-of-speech (PoS) and syntactic information (Sagae et al., 2010; Buttery and Korhonen, 2007; Yang, 2010), and some of these annotations are available as part of CHILDES (MacWhinney, 2000). For resource-rich languages such as English, these annotations can be further extended with detailed information, for instance about synonymy from WordNet (Fellbaum, 1998), and about age of acquisition, imagery, concreteness and familiarity, among other properties, from the MRC Psycholinguistic Database (Coltheart, 1981). However, for many other languages, one of the challenges is annotating corpora in a context where resources and tools are less abundant and many are still under development.
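As a rough illustration of what searching PoS-annotated child language data can involve, the sketch below reads CHAT-style morphological tiers, where each token is written as `pos|lemma` with optional inflection suffixes, and returns the utterances containing a given tag. The tier strings and tags (`pro`, `v`, `art`, `n`, `adv`) are hypothetical examples made up for this sketch, not the actual tag set or data of the Portuguese CHILDES corpora, and the helper names `parse_mor` and `find_by_pos` are likewise illustrative.

```python
from typing import List, Tuple


def parse_mor(tier: str) -> List[Tuple[str, str]]:
    """Split a CHAT-style %mor tier into (pos, lemma) pairs.

    Simplifying assumption: each tagged token looks like 'pos|lemma',
    optionally followed by '&...' or '-...' inflection suffixes. Tokens
    without '|' (the '%mor:' label, punctuation) are skipped.
    """
    pairs = []
    for token in tier.split():
        if "|" not in token:
            continue  # tier label, punctuation or untagged material
        pos, rest = token.split("|", 1)
        # Keep only the lemma, dropping inflectional feature suffixes.
        lemma = rest.split("&", 1)[0].split("-", 1)[0]
        pairs.append((pos, lemma))
    return pairs


def find_by_pos(tiers: List[str], pos: str) -> List[int]:
    """Return indices of utterances with at least one token tagged `pos`."""
    return [
        i for i, tier in enumerate(tiers)
        if any(p == pos for p, _ in parse_mor(tier))
    ]


# Hypothetical tiers, invented for illustration only.
tiers = [
    "%mor:\tpro|eu v|querer&PRES art|o n|bolo .",
    "%mor:\tadv|não .",
    "%mor:\tn|mamãe v|dar&PAST pro|isso .",
]
print(find_by_pos(tiers, "n"))
```

A real search environment would also need to handle compounds, clitics and richer feature annotations, and would sit on top of the syntactic and lexical layers the abstract mentions; this sketch covers only the simplest PoS lookup.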
Similar resources
An Environment for searching Portuguese child language corpora
Language acquisition is the process by which humans acquire the capacity to perceive, comprehend and produce language. Considerable research effort has been devoted to examining information provided by the linguistic environment as well as possible algorithms that could learn from that. In this context, the availability of annotated resources on child language data opens up several avenues of e...
A large scale annotated child language construction database
Large scale annotated corpora of child language can be of great value in assessing theoretical proposals regarding language acquisition models. For example, they can help determine whether the type and amount of data required by a proposed language acquisition model can actually be found in a naturalistic data sample. To this end, several recent efforts have augmented the CHILDES child language...
Multiword Expressions in Child Language
The goal of this work is to introduce CHILDES-MWE, which contains English CHILDES corpora automatically annotated with Multiword Expressions (MWEs) information. The result is a resource with almost 350,000 sentences annotated with more than 70,000 distinct MWEs of various types from both longitudinal and latitudinal corpora. This resource can be used for large scale language acquisition studies...
High-accuracy Annotation and Parsing of CHILDES Transcripts
Corpora of child language are essential for psycholinguistic research. Linguistic annotation of the corpora provides researchers with better means for exploring the development of grammatical constructions and their usage. We describe an ongoing project that aims to annotate the English section of the CHILDES database with grammatical relations in the form of labeled dependency structures. To d...
An annotated English child language database
The use of large-scale naturalistic data has been opening up new investigative possibilities for language acquisition studies, providing a basis for empirical predictions and for evaluations of alternative acquisition hypotheses. One widely used resource is CHILDES (MacWhinney, 1995) with transcriptions for over 25 languages of interactions involving children, with the English corpora available...